A subsampled double bootstrap for massive data

Authors

  • Srijan Sengupta
  • Stanislav Volgushev
  • Xiaofeng Shao
Abstract

The bootstrap is a popular and powerful method for assessing the precision of estimators and inferential methods. However, for massive datasets, which are increasingly prevalent, the bootstrap becomes prohibitively costly to compute, and its feasibility is questionable even with modern parallel computing platforms. Recently, Kleiner, Talwalkar, Sarkar, and Jordan (2014) proposed a method called the Bag of Little Bootstraps (BLB) for massive data, which is more computationally scalable with little sacrifice of statistical accuracy. Building on BLB and the idea of the fast double bootstrap, we propose a new resampling method, the subsampled double bootstrap, for both independent data and time series data. We establish consistency of the subsampled double bootstrap under mild conditions for both the independent and dependent cases. Methodologically, for a given time budget, the subsampled double bootstrap is superior to BLB in terms of running time, sample coverage, and automatic implementation with fewer tuning parameters. Its advantage relative to BLB and the bootstrap is also demonstrated in numerical simulations and a data illustration.
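
The abstract does not spell out the algorithm, so the following is only a rough sketch of one way a subsample-then-single-resample scheme for independent data might look. It assumes that each iteration draws a small random subset, builds one size-n resample of that subset (stored as multinomial counts), and records the root sqrt(n) * (resample estimate - subset estimate). The function names `sdb_roots` and `weighted_mean`, the subset size, and the iteration count are all hypothetical choices for illustration and may differ from the paper's procedure.

```python
import numpy as np

def sdb_roots(data, estimator, subset_size, num_iter, rng):
    """Collect one root per iteration (illustrative sketch, not the paper's code)."""
    n = len(data)
    roots = np.empty(num_iter)
    uniform = np.full(subset_size, 1.0 / subset_size)
    for i in range(num_iter):
        # Assumed step 1: draw a small random subset without replacement.
        subset = data[rng.choice(n, size=subset_size, replace=False)]
        # Assumed step 2: a single resample of size n from the subset,
        # represented by multinomial counts so only subset_size values are stored.
        counts = rng.multinomial(n, uniform)
        theta_subset = estimator(subset, np.ones(subset_size))
        theta_resample = estimator(subset, counts)
        roots[i] = np.sqrt(n) * (theta_resample - theta_subset)
    return roots

def weighted_mean(x, w):
    # The estimator must accept weights so the size-n resample is never materialized.
    return np.average(x, weights=w)

rng = np.random.default_rng(0)
data = rng.normal(loc=1.0, scale=2.0, size=100_000)  # synthetic data for the demo
roots = sdb_roots(data, weighted_mean, subset_size=1000, num_iter=200, rng=rng)

# Basic-bootstrap style 95% interval for the mean built from the root quantiles.
theta_hat = data.mean()
lo, hi = theta_hat - np.quantile(roots, [0.975, 0.025]) / np.sqrt(len(data))
```

Passing weights to the estimator, rather than expanding each resample to n points, is the design choice that keeps the per-iteration cost tied to the subset size in this sketch.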

Similar resources

SFB 823 A subsampled double bootstrap for massive data

The bootstrap is a popular and powerful method for assessing precision of estimators and inferential methods. However, for massive datasets which are increasingly prevalent, the bootstrap becomes prohibitively costly in computation and its feasibility is questionable even with modern parallel computing platforms. Recently Kleiner, Talwalkar, Sarkar, and Jordan (2014) proposed a method called BL...

A scalable bootstrap for massive data

The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large datasets—which are increasingly prevalent— the computation of bootstrap-based quantities can be prohibitively demanding computationally. While variants such as subsampling and the m out of n bootstrap can be used in principle to reduce the cost of bootstrap computation...

Improving the reliability of bootstrap tests with the fast double bootstrap

Two procedures are proposed for estimating the rejection probabilities of bootstrap tests in Monte Carlo experiments without actually computing a bootstrap test for each replication. These procedures are only about twice as expensive (per replication) as estimating rejection probabilities for asymptotic tests. Then a new procedure is proposed for computing bootstrap P values that will often be ...

Inferential Procedures Based on the Double Bootstrap for Log Logistic Regression Model with Censored Data

Traditional inferential procedures based on the asymptotic normality assumption such as the Wald often produce misleading inferences when dealing with censored data and small samples. Alternative estimation techniques such as the jackknife and bootstrap percentile allow us to construct the interval estimates without relying on any classical assumptions. Recently, the double bootstrap became pre...

Computational algorithms for double bootstrap confidence intervals

In some cases, such as in the estimation of impulse responses, it has been found that for plausible sample sizes the coverage accuracy of single bootstrap confidence intervals can be poor. The error in the coverage probability of single bootstrap confidence intervals may be reduced by the use of double bootstrap confidence intervals. The computer resources required for double bootstrap confiden...

Publication date: 2015